1. Rates of convergence in extreme value theory

Let $F$ denote a probability distribution function and suppose there exist constants $a_n > 0$ and $b_n$, for $n \ge 1$, and a non-degenerate distribution function $G$ such that

$$\lim_{n\to\infty} F^n(a_n x + b_n) = G(x). \tag{1.1}$$


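Then $G$ may be taken to be one of the "three types"

$$\Lambda(x) = \exp(-e^{-x}), \tag{1.2}$$

$$\Phi_\alpha(x) = \begin{cases} 0, & x \le 0, \\ \exp(-x^{-\alpha}), & x > 0, \end{cases} \qquad (\alpha > 0) \tag{1.3}$$

$$\Psi_\alpha(x) = \begin{cases} \exp(-(-x)^{\alpha}), & x < 0, \\ 1, & x \ge 0. \end{cases} \qquad (\alpha > 0) \tag{1.4}$$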

Alternatively, $G$ may be taken to be of "Generalized Extreme Value" form

$$G(x) = \exp\{-(1+\gamma x)_+^{-1/\gamma}\} \tag{1.5}$$

where $y_+ = \max(y, 0)$ and $-\infty < \gamma < \infty$, the case $\gamma = 0$ being interpreted as the limit $\gamma \to 0$, which is (1.2). The range of the distribution in this case is the set

$$R_\gamma = \{x : 1 + \gamma x > 0\},$$

i.e. $(-\gamma^{-1}, \infty)$ if $\gamma > 0$, $(-\infty, -\gamma^{-1})$ if $\gamma < 0$, $(-\infty, \infty)$ if $\gamma = 0$. These results are well known and we refer to the books of Galambos (1978) and Leadbetter, Lindgren and Rootzén (1983) for details.
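As a minimal sketch of (1.5), the following Python function evaluates the Generalized Extreme Value distribution function and treats $\gamma = 0$ as the Gumbel limit (1.2). The name `gev_cdf` and the tolerance `tol` used to detect $\gamma = 0$ are illustrative assumptions, not part of the original text.

```python
import math

def gev_cdf(x, gamma, tol=1e-12):
    """G(x) = exp{-(1 + gamma*x)_+^(-1/gamma)}; gamma = 0 is read as exp(-exp(-x))."""
    if abs(gamma) < tol:
        return math.exp(-math.exp(-x))
    z = 1.0 + gamma * x
    if z <= 0.0:
        # Outside the range R_gamma: below the lower endpoint (gamma > 0) the
        # distribution function is 0; above the upper endpoint (gamma < 0) it is 1.
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-z ** (-1.0 / gamma))
```

For small non-zero `gamma` the values are close to the Gumbel case, which is the sense in which (1.2) is the limit of (1.5).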

Interest in rates of convergence started with the very early paper of Fisher and Tippett (1928). They showed, for normal extremes, that the appropriate limit is (1.2), but they argued that a "penultimate" approximation within the family (1.4) is better in practice. In the context of (1.5), this is equivalent to saying that the limiting value $\gamma = 0$ is better replaced by a sequence of values $\gamma_n$, where $\gamma_n \uparrow 0$ as $n \to \infty$.

The  modern  theory  of  rates  of  convergence  may  be  considered  to  have  begun 


with  the  works  of  Anderson  (1971,  1976)  and  Galambos  (1978,  Section  2.10). 

They  gave  general  formulae  for  computing  pointwise  rates  of  convergence.  Since 
then,  the  theory  has  developed  in  three  main  directions. 


The first direction has been towards the computation of explicit upper bounds for

$$\sup_x |F^n(a_n x + b_n) - G(x)| \tag{1.6}$$

when $a_n$, $b_n$ are chosen appropriately. Hall and Wellner (1979) obtained the sharp upper bound $n^{-1}(2+n^{-1})e^{-2}$ when $F$ is exponential, and Hall (1979) obtained the bound $3(\log n)^{-1}$ when $F$ is normal, both with $G = \Lambda$. Davis (1982) combined

the  Hall-Wellner  result  with  the  probability  integral  transform  to  obtain  a 

result  for  general  F,  but  it  requires  rather  detailed  computations  to  apply  it 

to  any  particular  case.  The  best  results  in  this  direction  have  been  obtained 

by Resnick (1986), who gave general results assuming essentially the von Mises conditions, introduced in Section 2. An interesting alternative approach, based on Zolotarev's method of ideal metrics, is given by Zolotarev and Rachev (1985), though this is currently confined to the $\Phi_\alpha$ and $\Psi_\alpha$ limits.
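The exponential case can be checked numerically. The sketch below approximates the supremum in (1.6) on a grid for $F(x) = 1 - e^{-x}$ with $G = \Lambda$ and compares it with the bound $n^{-1}(2+n^{-1})e^{-2}$ quoted above. The norming constants $a_n = 1$, $b_n = \log n$, the grid limits and the function names are assumptions made for illustration.

```python
import math

def sup_error_exponential(n, lo=-10.0, hi=30.0, steps=40000):
    """Grid approximation to sup_x |F^n(x + log n) - Lambda(x)| for F(x) = 1 - exp(-x)."""
    b_n = math.log(n)  # assumed norming constants: a_n = 1, b_n = log n
    worst = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        y = x + b_n
        f_n = (1.0 - math.exp(-y)) ** n if y > 0 else 0.0
        gumbel = math.exp(-math.exp(-x))
        worst = max(worst, abs(f_n - gumbel))
    return worst

def hall_wellner_bound(n):
    """The bound n^{-1} (2 + n^{-1}) e^{-2} quoted in the text."""
    return (2.0 + 1.0 / n) * math.exp(-2.0) / n
```

A grid maximum can only underestimate the true supremum, so the comparison is a consistency check rather than a proof.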

The  second  direction  of  study  stems  from  Anderson  (1971),  and  is  really 
more  concerned  with  the  structure  of  the  remainder  term  than  with  explicit 
bounds. Smith (1982) derived uniform rates of convergence to $\Phi_\alpha$ assuming a
"slow  variation  with  remainder"  condition 

F jt)1  *  *-“<l*°te(t»> 

for each fixed $x > 0$, where $g(t) \to 0$ as $t \to \infty$. A simple transformation allows this approach also to be applied to $\Psi_\alpha$. Cohen (1982b) took rather a similar approach to the limit $\Lambda$, starting with the de Haan (1970) representation
-log  F(x)  =  c(x)  exp  {-/£  1^-  dt)  (x  2  X) 

($c(x) \to c_1$, $a(x) \to 1$, $f$ differentiable and $f'(x) \to 0$). As was pointed out by
Anderson  (1984),  the  alternative  representation  with  a(t)  =  1,  due  to  Balkema 
and  de  Haan  (1972),  allows  some  simplification  of  Cohen's  results.  In  most 
cases this approach leads to improved approximations for $F^n$. Rates of convergence of the penultimate approximation have also been established (Cohen 1982a, b, Gomes 1984), the normal case for instance being of $O\{(\log n)^{-2}\}$. The two directions for $\Phi_\alpha$ have partly been brought together by Omey and Rachev (1987).

The  third  direction  of  study  concerns  the  extension  of  the  problem  from 
statements  about  (1.1)  or  (1.6)  to  more  general  convergence  criteria  involving 
the  joint  distribution  of  several  largest  order  statistics  and  convergence  of 
densities  instead  of  distribution  functions.  These  considerations  are 
especially  relevant  for  statistical  applications.  Reiss  (1981)  obtained  an 
asymptotic  expansion  for  the  distribution  of  the  k  largest  order  statistics 
from  the  uniform  distribution,  with  rates  of  convergence  (see  also  Kohne  and 
Reiss,  1983)  and  Falk  (1986)  extended  this  to  general  distributions  via  the 
probability  integral  transform.  This  would  appear  to  be  a  very  powerful 
approach,  though  Falk’s  conditions  are  not  easy  to  verify  in  particular  cases. 
Weissman  (1984)  took  a  different  point  of  view,  asking  how  fast  k  could  grow 
(as  a  function  of  n)  for  convergence  to  remain  valid.  Reiss  (1984)  pointed  out 
the importance of Hellinger distance for statistical applications.

The  present  work  is  aimed  at  partly  unifying  these  different  approaches, 
both  with  a  view  to  combining  the  results  for  the  three  domains  of  attraction, 
and  incorporating  the  approach  of  Reiss  and  Falk  within  the  general  scheme. 
Convergence  in  Hellinger  distance  implies  convergence  in  total  variation 
distance,  which  in  turn  implies  uniform  convergence  of  distribution  functions. 
Therefore  it  seems  to  us  that  Hellinger  distance  is  the  most  appropriate 
distance  measure  to  use.  The  usefulness  of  Hellinger  distance  in  statistical 
applications  is  explained  briefly  in  Section  3. 

The  structure  of  the  paper  is  as  follows.  Section  2  develops  the 
approximations  we  use.  The  emphasis  here  is  on  having  a  single  form  of 
improved  approximation  valid  for  all  three  types.  We  also  extend  the  notion  of 


penultimate approximation. In Section 3, proofs of convergence in Hellinger
distance  are  given.  These  cover  both  the  classical  and  threshold  forms  of 
extreme  value  approximation,  and  are  expanded  also  to  cover  the  joint 
distribution  of  k  largest  order  statistics  (for  fixed  k).  Finally  in  Section  4 
we  give  numerical  examples  of  our  new  approximations,  demonstrating  that  they 
really  do  make  a  considerable  improvement  on  the  classical  extreme  value 
approximations.


2.  Development  of  the  approximations 

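Suppose $F$ has density $f(x) = dF(x)/dx$ defined on the range $(x_*, x^*)$ where $x_* = \inf\{x : F(x) > 0\} \ge -\infty$, $x^* = \sup\{x : F(x) < 1\} \le \infty$. Then we may write

$$-\log F(x) = \exp\left\{-\int^{x} \frac{dt}{\phi(t)}\right\}, \qquad x_* < x < x^*, \tag{2.1}$$

where

$$\phi(x) = \frac{-F(x)\log F(x)}{f(x)}. \tag{2.2}$$

Sometimes we use the alternative representation

$$1 - F(x) = \exp\left\{-\int^{x} \frac{dt}{\phi(t)}\right\}, \qquad x_* < x < x^*, \tag{2.3}$$

where

$$\phi(x) = \frac{1 - F(x)}{f(x)}. \tag{2.4}$$

Whichever form is adopted, we shall assume $\phi$ is continuously differentiable and

$$\lim_{x \uparrow x^*} \phi'(x) = \gamma \tag{2.5}$$

for some real $\gamma$.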

Equation (2.5) is one form of the well-known von Mises conditions which are sufficient though not necessary for the domain of attraction of an extreme value distribution (see de Haan (1976)). It makes no difference to the limit which of the two definitions of $\phi$ is adopted, and the limit is given by (1.5) with the same $\gamma$. The precise significance of (2.5) has been given by Pickands (1986): it is a necessary and sufficient condition for "twice-differentiable" convergence, meaning that not only (1.1) holds but also convergence of the corresponding densities and derivatives of the densities. Convergence of densities alone has been studied also by Sweeting (1985), following de Haan and Resnick (1982). For our present purposes, convergence of densities is relevant but our main motivation for assuming (2.5) is mathematical tractability.

From (2.1) we have

$$\frac{-\log F(u + x\phi(u))}{-\log F(u)} = \exp\left\{-\int_0^x \frac{\phi(u)}{\phi(u + s\phi(u))}\,ds\right\}. \tag{2.6}$$

By the mean value theorem, for each $s$

$$\frac{\phi(u + s\phi(u))}{\phi(u)} = 1 + \int_0^s \phi'(u + w\phi(u))\,dw = 1 + s\phi'(y) \tag{2.7}$$

where $y$ is between $u$ and $u + s\phi(u)$. Consequently

$$\int_0^x \left\{\frac{\phi(u)}{\phi(u + s\phi(u))} - \frac{1}{1 + s\phi'(y)}\right\} ds$$

is a continuous function of $y$, takes on both positive and negative values as $y$ ranges from $u$ to $u + x\phi(u)$ (unless $\phi'$ is constant), and so is zero for at least one $y$. Substituting in (2.6),

$$\frac{-\log F(u + x\phi(u))}{-\log F(u)} = \{1 + x\phi'(y)\}^{-1/\phi'(y)} \tag{2.8}$$

for some $y$ between $u$ and $u + x\phi(u)$. Now let us define, for each $n \ge 1$, $b_n$ such that $-\log F(b_n) = n^{-1}$ (well-defined, since $F$ is continuous) and let $a_n = \phi(b_n)$. Substituting $u = b_n$ in (2.8),

$$F^n(a_n x + b_n) = \exp[-\{1 + x\gamma_n(x)\}^{-1/\gamma_n(x)}] \tag{2.9}$$

where $\gamma_n(x) = \phi'(y)$, $y$ being as in (2.8). If $a_n x + b_n$ is outside the range $(x_*, x^*)$ then we interpret both sides of (2.9) to be 0 or 1 as appropriate.

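Now suppose (2.5) holds, and let $x$ be a fixed number in the range $R_\gamma$ (defined following (1.5)). It is easily verified from (2.5) that

$$\lim_{u \to \infty} \frac{\phi(u)}{u} = \gamma \quad (x^* = \infty); \qquad \lim_{u \uparrow x^*} \frac{\phi(u)}{x^* - u} = -\gamma \quad (x^* < \infty) \tag{2.10}$$

and hence that $u + x\phi(u) \uparrow x^*$ uniformly over finite ranges of $x$ as $u \uparrow x^*$.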

Thus (2.9) tends to (1.5) as $n \to \infty$. This provides an independent proof of the sufficiency of (2.5) for (1.1), but in a form particularly well suited for the machinations to follow.


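If we start with (2.3) in place of (2.1), then the argument is the same up to (2.8), which now reads

$$\frac{1 - F(u + x\phi(u))}{1 - F(u)} = \{1 + x\phi'(y)\}^{-1/\phi'(y)} \tag{2.11}$$

which in turn implies for $x > 0$ that, as $u \uparrow x^*$,

$$\frac{1 - F(u + x\phi(u))}{1 - F(u)} \to \begin{cases} (1 + x\gamma)^{-1/\gamma}, & (\gamma \ge 0, \text{ or } \gamma < 0 \text{ and } 0 < x < -\gamma^{-1}), \\ 0, & (\gamma < 0,\ x \ge -\gamma^{-1}). \end{cases} \tag{2.12}$$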


This  is  the  Generalised  Pareto  distribution  introduced  by  Pickands  (1975), 


which  is  particularly  useful  as  a  model  of  excesses  over  high  thresholds.  Some 


statistical  applications  are  given  by  Smith  (1984,  1987),  Davison  (1984),  Joe 


(1987)  and  Hosking  and  Wallis  (1987). 


So far we have replaced $\phi'(y)$ by $\gamma$. In some sense, however, what we are doing is expanding the tail of $F$ about $u$, so it may make more sense to approximate $\phi'(y)$ by $\phi'(u)$. This is especially true if $\gamma = 0$ for then, by virtue of (2.10), $u + x\phi(u)$ is (for fixed $x$ as $u \uparrow x^*$) much closer to $u$ than to $x^*$. Thus we replace $\gamma$ in (1.5) by $\gamma_n = \phi'(b_n)$, or in (2.12) by $\gamma(u) = \phi'(u)$. The first of these is the penultimate approximation, precisely as it is defined by Gomes (1984) and equivalently to the definition of Cohen (1982b). Although Cohen and Gomes both prove that the penultimate approximation is better in general than the ultimate approximation (in the sense of giving a faster rate of convergence) they do not really give any motivation for considering it in the first place. The foregoing may provide some. Moreover, it also suggests that we could do the same thing when $\gamma \ne 0$, providing a penultimate approximation in this case also. Some of the evidence given later will suggest that this is an advantageous thing to do. So far as we are aware, this is the first time that a penultimate approximation has been suggested when $\gamma \ne 0$.
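To make the penultimate idea concrete, the following sketch compares the ultimate approximation ($\gamma = 0$) and the penultimate approximation ($\gamma_n = \phi'(b_n)$) for maxima of $n$ standard normal variables, using $\phi$ from (2.2), $-\log F(b_n) = n^{-1}$ and $a_n = \phi(b_n)$. The bisection bracket, the finite-difference step `h`, the evaluation grid and all function names are assumptions chosen for illustration only.

```python
import math

def norm_tail(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))      # 1 - F(x)

def neg_log_F(x):
    return -math.log1p(-norm_tail(x))               # -log F(x)

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def phi(x):
    # (2.2): phi(x) = -F(x) log F(x) / f(x)
    return (1.0 - norm_tail(x)) * neg_log_F(x) / norm_pdf(x)

def solve_b(n):
    # Bisection for -log F(b_n) = 1/n; -log F is decreasing in x.
    lo, hi = -5.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if neg_log_F(mid) > 1.0 / n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gev(x, gamma):
    if gamma == 0.0:
        return math.exp(-math.exp(-x))
    z = 1.0 + gamma * x
    if z <= 0.0:
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-z ** (-1.0 / gamma))

def sup_errors(n, h=1e-5):
    """Return (gamma_n, sup error of ultimate, sup error of penultimate) on a grid."""
    b = solve_b(n)
    a = phi(b)
    gamma_n = (phi(b + h) - phi(b - h)) / (2.0 * h)  # numerical phi'(b_n)
    ult = pen = 0.0
    for i in range(-400, 1201):
        x = i / 100.0
        fn = math.exp(-n * neg_log_F(a * x + b))     # F^n(a_n x + b_n)
        ult = max(ult, abs(fn - gev(x, 0.0)))
        pen = max(pen, abs(fn - gev(x, gamma_n)))
    return gamma_n, ult, pen
```

Calling `sup_errors(100)` returns a negative `gamma_n`, in line with the Fisher and Tippett observation that the penultimate approximation for normal extremes lies in the family (1.4).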

If we want to go beyond this, the logical next step in view of (2.7) is to
consider an expansion of $\phi'(u + w\phi(u))$ about $\phi'(u)$. At this point, however, we
interrupt  the  proceedings  to  give  some  examples.  These  will  serve  both  to 
illustrate  what  has  been  done  so  far,  and  to  motivate  the  next  step. 


Example 1 Suppose $x^* = +\infty$ and

$$-\log F(x) = Cx^{-\alpha}\{1 + Dx^{-\beta} + O(x^{-\beta-\epsilon})\}, \qquad x \to \infty, \tag{2.13}$$

where $C$, $\alpha$, $\beta$, $\epsilon$ are positive constants and $D$ is real. This includes nearly all practical examples in the domain of attraction of (1.3), e.g. Pareto, Cauchy, t, F. We assume the relation (2.13) is twice differentiable, in the sense that we can differentiate term by term without affecting the order of the O-term. It follows that

$$\phi'(x) = \frac{1}{\alpha} + \frac{D\beta(\beta-1)}{\alpha^2}\,x^{-\beta} + O(x^{-\beta-\epsilon}). \tag{2.14}$$

Thus $\gamma = \alpha^{-1}$ and the rate of convergence in (1.1) is $O(\phi'(b_n) - \gamma) = O(b_n^{-\beta}) = O(n^{-\beta/\alpha})$ as in Smith (1982). However, in the case $\beta = 1$ the second term in (2.14) is 0 and so the rate of convergence is $o(n^{-1/\alpha})$. Smith (1982) showed the conventional approximation

$$F^n(b_n x) \approx \Phi_\alpha(x) \qquad (F(b_n) = \exp(-n^{-1}))$$

achieves $O(n^{-\beta/\alpha})$ for all $\beta$ and, though a way of reducing this to $o(n^{-1/\alpha})$ when $\beta = 1$ was proposed, the construction is artificial. Incidentally, the rate of $O(n^{-\beta/\alpha})$ is optimal (amongst all choices of $a_n$, $b_n$) when $\beta \ne 1$.

Continuing from (2.14), we have when $\beta \ne 1$

$$\phi'(u + x\phi(u)) - \phi'(u) \sim \frac{D\beta(\beta-1)}{\alpha^2}\,u^{-\beta}\left[\left\{1 + \frac{x\phi(u)}{u}\right\}^{-\beta} - 1\right] \sim \frac{D\beta(\beta-1)}{\alpha^2}\,u^{-\beta}\left[\{1 + x\phi'(u)\}^{-\beta} - 1\right] \tag{2.15}$$

using (2.5) and (2.10).

If we start with $1 - F(x)$ in place of $-\log F(x)$ in (2.13), then the corresponding results hold for the threshold approximation (2.12).
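The expansion of $\phi'$ in Example 1 can be checked numerically. Writing $L(x) = -\log F(x)$, definition (2.2) gives $\phi = -L/L'$ and hence $\phi' = -1 + LL''/L'^2$. The sketch below takes $L(x) = Cx^{-\alpha}(1 + Dx^{-\beta})$ exactly, with no remainder term, and compares $\phi'(x) - 1/\alpha$ with the second-order term $D\beta(\beta-1)\alpha^{-2}x^{-\beta}$. The parameter values and function names are illustrative assumptions.

```python
def phi_prime_example1(x, C=1.0, alpha=1.0, beta=2.0, D=1.0):
    """Exact phi'(x) = -1 + L L'' / L'^2 for L(x) = C x^-alpha (1 + D x^-beta)."""
    L = C * x ** (-alpha) * (1.0 + D * x ** (-beta))
    L1 = -C * (alpha * x ** (-alpha - 1)
               + D * (alpha + beta) * x ** (-alpha - beta - 1))
    L2 = C * (alpha * (alpha + 1) * x ** (-alpha - 2)
              + D * (alpha + beta) * (alpha + beta + 1) * x ** (-alpha - beta - 2))
    return -1.0 + L * L2 / (L1 * L1)

def second_term(x, alpha=1.0, beta=2.0, D=1.0):
    """Second-order term D beta (beta - 1) / alpha^2 * x^-beta."""
    return D * beta * (beta - 1.0) / alpha ** 2 * x ** (-beta)
```

With `beta=1.0` the second-order term vanishes and $\phi'(x) - 1/\alpha$ is of smaller order than $x^{-1}$, which is the special case singled out in the text.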


Example 2 Suppose $x^* < \infty$ and

$$-\log F(x) = C(x^*-x)^{\alpha}\{1 + D(x^*-x)^{\beta} + O((x^*-x)^{\beta+\epsilon})\}, \qquad x \uparrow x^*, \tag{2.16}$$

($C$, $\alpha$, $\beta$, $\epsilon$ positive, $D$ real) and that this relation is twice differentiable. If we replace $F(x)$ by $1 - F(x^* - x)$, this includes many distributions in the minimum domain of attraction of the Weibull distribution, with applications to reliability and elsewhere. In this case

$$\phi'(x) = -\frac{1}{\alpha} + \frac{D\beta(\beta+1)}{\alpha^2}\,(x^*-x)^{\beta} + O((x^*-x)^{\beta+\epsilon}) \tag{2.17}$$

so $\gamma = -\alpha^{-1}$ and

$$\phi'(u + x\phi(u)) - \phi'(u) \sim \frac{D\beta(\beta+1)}{\alpha^2}\{(x^*-u-x\phi(u))^{\beta} - (x^*-u)^{\beta}\} \sim \frac{D\beta(\beta+1)}{\alpha^2}\,(x^*-u)^{\beta}\{(1 + x\phi'(u))^{\beta} - 1\}. \tag{2.18}$$

In this case the rate of convergence in (1.1) is $O(n^{-\beta/\alpha})$ and there is no possibility of improving this by a different choice of $a_n$ and $b_n$ (Smith 1982). Again, if we start with $1 - F(x)$ in (2.16) then we get similar approximations for the threshold distribution.
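The same kind of check applies to Example 2. With $z = x^* - x$ and $L = -\log F = Cz^{\alpha}(1 + Dz^{\beta})$ taken exactly, the identity $\phi' = -1 + LL''/L'^2$ (derivatives with respect to $x$) can be compared with the leading correction $D\beta(\beta+1)\alpha^{-2}z^{\beta}$. The parameter values and function names below are assumptions for illustration.

```python
def phi_prime_example2(z, C=1.0, alpha=2.0, beta=1.0, D=1.0):
    """Exact phi' at x = x^* - z for L = C z^alpha (1 + D z^beta)."""
    L = C * z ** alpha * (1.0 + D * z ** beta)
    # Derivatives are with respect to x, and d/dx = -d/dz.
    L1 = -C * (alpha * z ** (alpha - 1)
               + D * (alpha + beta) * z ** (alpha + beta - 1))
    L2 = C * (alpha * (alpha - 1) * z ** (alpha - 2)
              + D * (alpha + beta) * (alpha + beta - 1) * z ** (alpha + beta - 2))
    return -1.0 + L * L2 / (L1 * L1)

def second_term_example2(z, alpha=2.0, beta=1.0, D=1.0):
    """Leading correction D beta (beta + 1) / alpha^2 * z^beta."""
    return D * beta * (beta + 1.0) / alpha ** 2 * z ** beta
```

For small `z` the difference `phi_prime_example2(z) + 1/alpha` agrees with `second_term_example2(z)` to leading order.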


In  neither  example  so  far  have  we  emphasized  the  penultimate 


approximation,  but  numerical  evidence  of  its  efficacy  will  be  given  later. 


Example 3 Let $\gamma = 0$. If we slightly strengthen the conditions for what Cohen (1982b) called Class N, then it is valid to make a Taylor expansion

$$\phi'(u + x\phi(u)) - \phi'(u) \sim x\phi(u)\phi''(u). \tag{2.19}$$

Examples include most well-known distributions in the domain of attraction of $\Lambda$, e.g. normal, log normal, gamma, Weibull, but not the exponential or logistic distributions for which $\phi'$ decreases exponentially fast. These are, in fact, the most important cases to which the theory we are going to develop does not apply, though since the reason is essentially that the convergence occurs too quickly, we would argue that this exclusion is not of importance for statistical applications.
It is not obvious how to combine (2.15), (2.18) and (2.19) into a single general formula. We shall, however, make a proposal. Define the family of functions on $0 < x < \infty$,

$$h_\rho(x) = \int_1^x u^{\rho-1}\,du = \begin{cases} \dfrac{x^{\rho} - 1}{\rho}, & \rho \ne 0, \\ \log x, & \rho = 0. \end{cases} \tag{2.20}$$

This function often arises as a remainder term in the theory of slow variation (Smith, 1982, Goldie and Smith 1987).

We assume that there exist real $c$ and $\rho$ and a non-negative function $g$, with $g(u) \to 0$ as $u \uparrow x^*$, such that

$$\lim_{u \uparrow x^*} \frac{\phi'(u)\{\phi'(u + w\phi(u)) - \phi'(u)\}}{g(u)\,h_\rho(1 + w\phi'(u))} = c \tag{2.21}$$

for each $w \in R_\gamma$. We further assume that $\phi'(x)$ is non-zero and of the same sign for all sufficiently large $x < x^*$, and that $\rho$ is either 0 or of the opposite sign to $\phi'$. Examples:

Example 1 $\rho = -\beta$, $g(u) = u^{-\beta}$, $c = -D\beta^2(\beta-1)/\alpha^3$.

Example 2 $\rho = \beta$, $g(u) = (x^*-u)^{\beta}$, $c = -D\beta^2(\beta+1)/\alpha^3$.

Example 3 $\rho$ undetermined, $g(u) = \phi(u)|\phi''(u)|$, $c = \pm 1$.

Example 3 relies on $h_\rho(1 + w\phi'(u)) \sim w\phi'(u)$ as $\phi'(u) \to 0$. The fact that $\rho$ is undetermined in this case is not important, since the results we derive are independent of $\rho$ (in this case) up to the claimed order of approximation. Note that we also allow $c = 0$, so the $\beta = 1$ case of Example 1 is also included, though in this case a more logical approach would presumably be to take the next term in the expansion.

Substituting from (2.21) in (2.7) and then (2.6), setting $u = b_n$ where $-\log F(b_n) = n^{-1}$, $a_n = \phi(b_n)$, $\gamma_n = \phi'(b_n)$, $r_n = g(b_n)$, routine manipulations lead to

$$F^n(a_n x + b_n) = \exp[-(1 + x\gamma_n)_+^{-1/\gamma_n}\{1 + c r_n H_\rho(x, \gamma_n)\}] + o(r_n) \tag{2.22}$$


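for each fixed $x$, where

$$H_\rho(x, \eta) = \begin{cases} \dfrac{h_\rho(1 + x\eta) + \rho h_{-1}(1 + x\eta) - (\rho+1)\log(1 + x\eta)}{\rho(\rho+1)\eta^3}, & 1 + x\eta > 0, \\ 0, & 1 + x\eta \le 0. \end{cases} \tag{2.23}$$

A rigorous derivation of (2.22) will be given in the next section. The cases $\rho = 0$, $\rho = -1$ are defined by taking appropriate limits as

$$H_{-1}(x, \eta) = \frac{\log(1 + x\eta) + (1 + x\eta)^{-1}\log(1 + x\eta) - 2\{1 - (1 + x\eta)^{-1}\}}{\eta^3},$$

$$H_0(x, \eta) = \frac{\tfrac{1}{2}\log^2(1 + x\eta) - \log(1 + x\eta) + 1 - (1 + x\eta)^{-1}}{\eta^3}$$

when $1 + x\eta > 0$. Note also that

$$\lim_{\eta \to 0} H_\rho(x, \eta) = \frac{x^3}{6}, \tag{2.24}$$

confirming that, in the case $\gamma_n \to 0$, $H_\rho(x, \gamma_n)$ in (2.22) may be replaced with $x^3/6$ (independent of $\rho$) without affecting the claimed rate of convergence.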
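The limit in (2.24) is easy to check numerically. The sketch below implements $h_\rho$ from (2.20) and $H_\rho$ on the region $1 + x\eta > 0$, with the limiting forms used at $\rho = 0$ and $\rho = -1$. The function names `h` and `H`, and the choice to raise an error outside that region, are assumptions made for illustration.

```python
import math

def h(rho, x):
    """h_rho(x) of (2.20) for x > 0."""
    return math.log(x) if rho == 0.0 else (x ** rho - 1.0) / rho

def H(rho, x, eta):
    """H_rho(x, eta) for 1 + x*eta > 0 and eta != 0."""
    y = 1.0 + x * eta
    if y <= 0.0:
        raise ValueError("this sketch only covers 1 + x*eta > 0")
    log_y = math.log(y)
    if rho == 0.0:
        # limiting form as rho -> 0
        num = 0.5 * log_y ** 2 - log_y + 1.0 - 1.0 / y
    elif rho == -1.0:
        # limiting form as rho -> -1
        num = log_y + log_y / y - 2.0 * (1.0 - 1.0 / y)
    else:
        num = (h(rho, y) + rho * h(-1.0, y) - (rho + 1.0) * log_y) / (rho * (rho + 1.0))
    return num / eta ** 3
```

For small `eta` the value is close to `x**3 / 6` whatever the value of `rho`. For very small `eta` the numerator suffers floating-point cancellation, so a series expansion would be preferable in careful numerical work.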


For the threshold approximation (2.12), we should start with (2.3) instead of (2.1); the result then obtained is

$$\frac{1 - F(u + x\phi(u))}{1 - F(u)} = \{1 + x\phi'(u)\}_+^{-1/\phi'(u)}\{1 + c g(u) H_\rho(x, \phi'(u))\} + o(g(u)) \tag{2.25}$$

for each $x > 0$.


3. Hellinger convergence

Define

$$F_n(x) = F^n(a_n x + b_n), \qquad G_n(x) = \exp[-(1 + x\gamma_n)_+^{-1/\gamma_n}\{1 + c r_n H_\rho(x, \gamma_n)\}]. \tag{3.1}$$

In (2.22), we asserted that $|F_n(x) - G_n(x)| = o(r_n)$ for each fixed $x$. It is natural to ask whether this result holds uniformly over all $x$.


This is not the only sense, however, in which the closeness of $F_n$ and $G_n$ could be measured. Another question is whether the densities $f_n = dF_n/dx$, $g_n = dG_n/dx$ converge uniformly at rate $o(r_n)$. If they do, then it follows from an easy extension of Scheffé's Lemma that

$$\sup_B \left|\int_B f_n(x)\,dx - \int_B g_n(x)\,dx\right| = o(r_n), \tag{3.2}$$

where the supremum is over all Borel sets $B$. This is the mode of convergence used by Falk (1986). Another measure studied by Reiss (1984) is Hellinger distance:

$$H(f_n, g_n) = \left[\int \{f_n^{1/2}(x) - g_n^{1/2}(x)\}^2\,dx\right]^{1/2}. \tag{3.3}$$

If $H(f_n, g_n) = o(r_n)$ then (3.2) is immediate.
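The relation between the two distances can be illustrated numerically. The sketch below approximates the Hellinger distance (3.3) and the total variation distance (the supremum in (3.2), equal to half the $L^1$ distance) by a midpoint rule, here for a Gumbel density and a Generalized Extreme Value density with $\gamma = 0.1$. The choice of densities, the integration range and the function names are illustrative assumptions.

```python
import math

def gumbel_pdf(x):
    return math.exp(-x - math.exp(-x))

def gev_pdf(x, gamma):
    """Density of (1.5) for gamma != 0; zero outside the range 1 + gamma*x > 0."""
    z = 1.0 + gamma * x
    if z <= 0.0:
        return 0.0
    t = z ** (-1.0 / gamma)
    return t ** (1.0 + gamma) * math.exp(-t)

def hellinger_and_tv(f, g, lo, hi, steps):
    """Midpoint-rule approximations to the Hellinger distance and total variation."""
    dx = (hi - lo) / steps
    h2 = tv = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        fx, gx = f(x), g(x)
        h2 += (math.sqrt(fx) - math.sqrt(gx)) ** 2 * dx
        tv += 0.5 * abs(fx - gx) * dx
    return math.sqrt(h2), tv
```

By the Cauchy-Schwarz inequality the total variation distance never exceeds the Hellinger distance as defined in (3.3), which is why a Hellinger rate of $o(r_n)$ yields (3.2).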


Equations (3.2) and (3.3) have direct statistical interpretation. For example, if $B$ is the rejection region of some test calculated under the assumption that $g_n$ is the correct distribution, then (3.2) says that the error in the computed probability of rejection is at most $o(r_n)$. The importance of Hellinger distance arises from the following inequality, pointed out by Reiss. Suppose we have $N$ independent observations from each of $f_n$ and $g_n$, and let $f_n^{(N)}$, $g_n^{(N)}$ denote the resulting joint densities. Then

$$H(f_n^{(N)}, g_n^{(N)}) \le N^{1/2} H(f_n, g_n).$$

Suppose $H(f_n, g_n) = o(r_n)$ and $n \to \infty$, $N \to \infty$ such that $N r_n^2$ is bounded. Then

$$H(f_n^{(N)}, g_n^{(N)}) \to 0$$

so that the total variation distance between $f_n^{(N)}$ and $g_n^{(N)}$ is asymptotically negligible, i.e. statistical calculations carried out as if $g_n$ was the correct density remain valid when sampling from $f_n$. This provides an alternative method of justifying statistical calculations based on extreme value approximations, avoiding the awkward moment-convergence technicalities of Goldie and Smith (1987), Smith (1987), Cohen (1987a, 1987b) and Joe (1987).
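The product inequality can be seen from the Hellinger affinity $\int (fg)^{1/2}\,dx = 1 - \tfrac{1}{2}H^2(f, g)$, which multiplies across independent observations. The short sketch below uses this standard fact to compute the distance between $N$-fold products from the single-observation distance; the function name is an illustrative assumption.

```python
import math

def hellinger_product(h_single, N):
    """Hellinger distance (3.3) between N-fold products, given the one-sample distance."""
    affinity = 1.0 - 0.5 * h_single ** 2      # integral of sqrt(f * g)
    return math.sqrt(2.0 * (1.0 - affinity ** N))
```

Since $1 - a^N \le N(1 - a)$ for $0 \le a \le 1$, the returned value is at most $N^{1/2}$ times `h_single`.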


The main additional condition needed to prove Hellinger convergence is $\gamma > -\tfrac{1}{2}$. This condition is easily understood statistically, since when $\gamma \le -\tfrac{1}{2}$ the problem is non-regular and standard maximum likelihood techniques fail. Alternatives in these cases are proposed by Smith (1985, 1987).

For further information about Hellinger distance and total variation distance we refer to Ibragimov and Has'minskii (1981) or Section 4.2 of LeCam (1986). The following result is adapted from Theorem 7.6 of Ibragimov and Has'minskii:

Lemma 3.1 Let $f_0(x; \theta)$ denote a family of non-negative functions indexed by vector parameter $\theta \in \Theta$. Let $g_0(x; \theta) = f_0^{1/2}(x; \theta)$ with gradient vector $\nabla g_0$ with respect to $\theta$. Suppose $f_1$, $f_2$ are two functions such that, for each $x$ in a set $B$, there exist $\theta_i(x)$ ($i = 1, 2$) such that $f_i(x) = f_0(x; \theta_i(x))$. Suppose $\theta_i(x) \in \Theta^* \subset \Theta$ for each $x \in B$, $i = 1, 2$. Then

$$\int_B \{f_1^{1/2}(x) - f_2^{1/2}(x)\}^2\,dx \le \sup_{x \in B} |\theta_1(x) - \theta_2(x)|^2 \int_B \sup_{\theta \in \Theta^*} |\nabla g_0(x, \theta)|^2\,dx. \tag{3.4}$$

Remark 3.2 This differs from Ibragimov and Has'minskii in that $\theta_1$ and $\theta_2$ depend on $x$; i.e. $f_i$ do not have to be members of the family $f_0(x; \theta)$ but only close to it. Finiteness of the integral in (3.4) is closely related to the boundedness (over $\Theta^*$) of the trace of the Fisher information matrix.

Proof. We have

$$f_1^{1/2}(x) - f_2^{1/2}(x) = g_0(x; \theta_1(x)) - g_0(x; \theta_2(x)) = \int_0^1 \{\theta_1(x) - \theta_2(x)\}^T \nabla g_0\{x; \theta_1(x) + t(\theta_2(x) - \theta_1(x))\}\,dt,$$

so that

$$\{f_1^{1/2}(x) - f_2^{1/2}(x)\}^2 \le |\theta_1(x) - \theta_2(x)|^2 \int_0^1 |\nabla g_0\{x; \theta_1(x) + t(\theta_2(x) - \theta_1(x))\}|^2\,dt.$$

Now just integrate with respect to $x$.

We  now  come  to  our  main  result. 


Theorem 3.3 Suppose $\phi$, defined from (2.1), satisfies (2.5) with $\gamma > -\tfrac{1}{2}$, and (2.21) with its associated conditions. Define $b_n$ by $F(b_n) = \exp(-n^{-1})$, $a_n = \phi(b_n)$, $\gamma_n = \phi'(b_n)$, $r_n = g(b_n)$. Define $F_n$, $G_n$ by (3.1) with associated densities $f_n$, $g_n$. Suppose there exist, for each $u$, variables $s_1(u) > 0$, $s_2(u) < 0$ such that

$$\lim_{u \uparrow x^*} \frac{\{1 + \phi'(u)s_1(u)\}^{-1/\phi'(u)}}{g^2(u)} = 0, \tag{3.5}$$

$$\lim_{u \uparrow x^*} \frac{\exp[-\{1 + \phi'(u)s_2(u)\}^{-1/\phi'(u)}]}{g^2(u)} = 0, \tag{3.6}$$

$$\lim_{u \uparrow x^*} g(u)\max[\{1 + s\phi'(u)\}^{\rho}, \{1 + s\phi'(u)\}^{-1}, \log\{1 + s\phi'(u)\}] = 0 \tag{3.7}$$

uniformly on $s \in (s_2(u), s_1(u))$. Define $c(u, x)$ by

$$c(u, x) = \frac{\phi'(u)\{\phi'(u + x\phi(u)) - \phi'(u)\}}{g(u)\,h_\rho(1 + x\phi'(u))}$$

and suppose also that

$$\lim_{u \uparrow x^*} c(u, s) = c \tag{3.8}$$

uniformly on $s \in (s_2(u), s_1(u))$. Then $r_n^{-1} H(f_n, g_n) \to 0$ as $n \to \infty$.


Remark 3.4 The simplest way to demonstrate (3.5)-(3.8) is to define $s_1$, $s_2$ by

$$\{1 + \phi'(u)s_1(u)\}^{-1/\phi'(u)} = g^K(u), \qquad \exp[-\{1 + \phi'(u)s_2(u)\}^{-1/\phi'(u)}] = g^K(u)$$

for some fixed $K > 2$, and then to show that (3.7), (3.8) hold for this choice of $s_1$, $s_2$. For (3.7), considering first the upper limit $s \uparrow s_1$, we have

$$g(u)\{1 + \phi'(u)s_1(u)\}^{\delta} = g(u)^{1 - K\delta\phi'(u)}$$

so we require $1 - K\delta\phi'(u) \ge \delta_1 > 0$ as $u \uparrow x^*$. The only case that causes any difficulty is when $\gamma < 0$ and $\delta = -1$: then we do need $\gamma > -\tfrac{1}{2}$. The limit as $s \to s_2$ is much easier since $\{1 + \phi'(u)s_2(u)\}^{-1/\phi'(u)}$ grows only logarithmically in $1/g(u)$. Thus (3.7) follows.
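The two defining equations of Remark 3.4 can be solved in closed form: $1 + \phi'(u)s_1(u) = g(u)^{-K\phi'(u)}$ and $1 + \phi'(u)s_2(u) = \{-K\log g(u)\}^{-\phi'(u)}$. The sketch below evaluates these for given numerical values of $g(u)$ and $\phi'(u)$. The function name, the default $K = 3$ and the sample inputs are assumptions for illustration, and the case $\phi'(u) = 0$ is not handled.

```python
import math

def truncation_points(g, phi_prime, K=3.0):
    """Solve the two equations of Remark 3.4 for s_1 > 0 > s_2 (phi_prime != 0, 0 < g < 1)."""
    s1 = (g ** (-K * phi_prime) - 1.0) / phi_prime
    s2 = ((-K * math.log(g)) ** (-phi_prime) - 1.0) / phi_prime
    return s1, s2
```

For example, `truncation_points(0.01, 0.2)` gives a large positive `s1` and a moderate negative `s2`, reflecting the polynomial decay of the upper tail and the much faster decay of the lower tail.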

Now let us consider (3.8), breaking this up into cases $\gamma = 0$, $\gamma > 0$, $\gamma < 0$. For $\gamma = 0$, it suffices from (2.19) that

$$\frac{\phi''(u + s\phi(u))}{\phi''(u)} \to 1 \quad \text{uniformly on } |s| \le K\,|\log|\phi(u)\phi''(u)|| \tag{3.9}$$

for some $K > 2$. This is similar to several conditions in Cohen (1982b), and is automatic if $\phi''(x)$ (in case $x^* = \infty$) or $\phi''(x^* - x^{-1})$ (in case $x^* < \infty$) is regularly varying. All of Cohen's "Class N" examples satisfy this.

For $\gamma > 0$, assuming (2.13) it follows that the relative error in (2.15) is $O(u^{-\epsilon})$ if $x > 0$, $O\{u^{\beta}(u + x\phi(u))^{-\beta-\epsilon}\}$ if $x < 0$. We must therefore show

$$u^{\beta}\{u + \phi(u)s_2(u)\}^{-\beta-\epsilon} \to 0. \tag{3.10}$$

Now

$$u + \phi(u)s_2(u) = \frac{\phi(u)}{\phi'(u)}\{1 + \phi'(u)s_2(u)\} + u\left\{1 - \frac{\phi(u)}{u\phi'(u)}\right\} = O\{u\,|\log g(u)|^{-\gamma}\} + O(u g(u))$$

from which (3.10) follows.

For $\gamma < 0$, assuming (2.16), a very similar argument settles (3.8) as $s \to s_2$ but we have an additional complication as $s \to s_1$ because of the possibility $u + s_1(u)\phi(u) > x^*$. This is most easily settled by defining $\phi'(x)$ to be $\gamma$ whenever $x > x^*$, $h_\rho(x)$ to be $-\rho^{-1}$ whenever $x < 0$ (assuming $\rho > 0$). Then it is easily seen that (3.8) holds.

Thus we would argue that (3.5)-(3.8) are reasonable assumptions which hold in most examples, after excluding certain cases which have been noted earlier.


Proof of Theorem 3.3 First we show

$$\int_{s_2(b_n)}^{s_1(b_n)} \{f_n^{1/2}(x) - g_n^{1/2}(x)\}^2\,dx = o(r_n^2), \tag{3.11}$$

later extending the range of integration to $(-\infty, \infty)$.

We may write

$$\begin{aligned} f_n(x) &= n a_n f(a_n x + b_n) F^{n-1}(a_n x + b_n) \\ &= \frac{\phi(b_n)}{-\log F(b_n)} \cdot \frac{-\log F(a_n x + b_n)}{\phi(a_n x + b_n)} \exp\left\{-\frac{-\log F(a_n x + b_n)}{-\log F(b_n)}\right\} \\ &= \frac{\phi(b_n)}{\phi(b_n + x\phi(b_n))} \exp\left[-\int_0^x \frac{\phi(b_n)\,ds}{\phi(b_n + s\phi(b_n))} - \exp\left\{-\int_0^x \frac{\phi(b_n)\,ds}{\phi(b_n + s\phi(b_n))}\right\}\right]. \end{aligned} \tag{3.12}$$


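By (2.7),

$$\begin{aligned} \frac{\phi(u + s\phi(u))}{\phi(u)} &= 1 + s\phi'(u) + \int_0^s \frac{c(u, w)g(u)h_\rho(1 + w\phi'(u))}{\phi'(u)}\,dw \\ &= 1 + s\phi'(u) + \frac{c_1(u, s)g(u)}{\phi'(u)} \int_0^s h_\rho(1 + w\phi'(u))\,dw \end{aligned}$$

where $c_1(u, s)$ is such that $c_1 \to c$ uniformly on $s \in (s_2(u), s_1(u))$. Evaluating the integral we have

$$\frac{\phi(u + s\phi(u))}{\phi(u)} = \{1 + s\phi'(u)\}\left[1 + \frac{c_1(u, s)g(u)}{\phi'(u)}\left\{\frac{(1 + s\phi'(u))^{\rho} - (1 + s\phi'(u))^{-1}}{\phi'(u)\rho(\rho+1)} - \frac{s(1 + s\phi'(u))^{-1}}{\rho}\right\}\right].$$

Now (3.7) shows that this is of form $\{1 + s\phi'(u)\}(1 + o(1))$ uniformly in $s$, so for the reciprocal we have

$$\frac{\phi(u)}{\phi(u + s\phi(u))} = \{1 + s\phi'(u)\}^{-1}\left[1 - \frac{c_2(u, s)g(u)}{\phi'(u)}\left\{\frac{(1 + s\phi'(u))^{\rho} - (1 + s\phi'(u))^{-1}}{\phi'(u)\rho(\rho+1)} - \frac{s(1 + s\phi'(u))^{-1}}{\rho}\right\}\right]$$

where $c_2$ is another function such that $c_2(u, s) \to c$ uniformly on $(s_2(u), s_1(u))$. This may also be written

$$\frac{\phi(u)}{\phi(u + s\phi(u))} = \{1 + s\phi'(u)\}^{-1} - c_2(u, s)g(u)H_\rho'(s, \phi'(u)) \tag{3.13}$$

where $H_\rho'$ is the derivative with respect to the first component of $H_\rho$.

For later purposes, it is also convenient to write (3.13) in the form

$$\frac{\phi(u)}{\phi(u + s\phi(u))} = \{1 + s\phi'(u)\}^{-1} - \frac{c_3(u, s)g(u)H_\rho'(s, \phi'(u))}{1 + c_3(u, s)g(u)H_\rho(s, \phi'(u))} \tag{3.14}$$

where $c_3 \to c$ uniformly; this is equivalent to (3.13) because of (3.7).

Now take (3.13) and integrate:

$$\int_0^x \frac{\phi(u)\,ds}{\phi(u + s\phi(u))} = \frac{\log\{1 + x\phi'(u)\}}{\phi'(u)} - g(u)\int_0^x c_2(u, s)H_\rho'(s, \phi'(u))\,ds$$

and hence

$$\exp\left\{-\int_0^x \frac{\phi(u)\,ds}{\phi(u + s\phi(u))}\right\} = \{1 + x\phi'(u)\}^{-1/\phi'(u)}\{1 + g(u)c_4(u, x)H_\rho(x, \phi'(u))\} \tag{3.15}$$

where $c_4(u, x)$ is yet another function satisfying $c_4 \to c$ uniformly on $(s_2(u), s_1(u))$.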

Define a new parametric family by

$$f_0(x; \theta_1, \theta_2) = \exp[-(1 + x\eta)^{-1/\eta}\{1 + \theta_1 H_\rho(x, \eta)\}] \cdot (1 + x\eta)^{-1/\eta}\{1 + \theta_1 H_\rho(x, \eta)\}\left[(1 + x\eta)^{-1} - \frac{\theta_2 H_\rho'(x, \eta)}{1 + \theta_2 H_\rho(x, \eta)}\right]$$

where the parameters $\eta$, which we shall identify with $\gamma_n$, and $\rho$ are not shown explicitly as parameters of $f_0$.

By (3.12), (3.14) and (3.15), we have

$$f_n(x) = f_0(x; c_4(b_n, x)r_n, c_3(b_n, x)r_n).$$

But directly from (3.1) we have

$$g_n(x) = f_0(x; c r_n, c r_n).$$

We have therefore set everything up to apply Lemma 3.1; we let $g_0 = f_0^{1/2}$, $B = (s_2(b_n), s_1(b_n))$ and take $\Theta^*$ to be some small interval around $(c r_n, c r_n)$. The only thing to show is that the integral in (3.4) is finite.

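Consider first what happens as $x \to s_1(b_n)$. Note that

$$\frac{\partial g_0}{\partial \theta_i} = \frac{1}{2 f_0^{1/2}} \frac{\partial f_0}{\partial \theta_i}, \qquad i = 1, 2.$$

As $x \to s_1$, we have $f_0 \asymp (1 + x\eta)^{-1/\eta - 1}$. Here we use $\asymp$ to denote "is the same order of magnitude as" and always keep in mind (3.7). Consider first $\eta < 0$. We have

$$\frac{\partial f_0}{\partial \theta_1} = \exp[-(1 + x\eta)^{-1/\eta}\{1 + \theta_1 H_\rho(x, \eta)\}]\,[(1 + x\eta)^{-1/\eta} H_\rho(x, \eta) - (1 + x\eta)^{-2/\eta} H_\rho(x, \eta)\{1 + \theta_1 H_\rho(x, \eta)\}]\left[(1 + x\eta)^{-1} - \frac{\theta_2 H_\rho'(x, \eta)}{1 + \theta_2 H_\rho(x, \eta)}\right]$$

the dominant term in which is $\asymp (1 + x\eta)^{-1/\eta - 2}$. Hence

$$\left(\frac{\partial g_0}{\partial \theta_1}\right)^2 \asymp (1 + x\eta)^{-1/\eta - 3}$$

which is integrable as $1 + x\eta \to 0$ because $\eta > -\tfrac{1}{2}$. A very similar calculation shows that

$$\left(\frac{\partial g_0}{\partial \theta_2}\right)^2$$

is of the same order of magnitude, as $1 + x\eta \to 0$.

Now suppose $\eta > 0$. In this case $s_1 \to \infty$ and we may assume $\rho \le 0$. Hence

$$\frac{\partial f_0}{\partial \theta_1} \asymp (1 + x\eta)^{-1/\eta - 1}\log(1 + x\eta), \qquad \left(\frac{\partial g_0}{\partial \theta_1}\right)^2 \asymp (1 + x\eta)^{-1/\eta - 1}\log^2(1 + x\eta).$$

Similarly we have

$$\left(\frac{\partial g_0}{\partial \theta_2}\right)^2 \asymp (1 + x\eta)^{-1/\eta - 1}.$$

So in this case the required integrals are finite for each $\eta > 0$ and even uniformly as $\eta \to 0$.

Similar calculations may be made as $x \to s_2$ but in this case there is no problem because everything is decaying exponentially. Hence we conclude that the integral in (3.4) is indeed bounded, so we deduce (3.11).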


To complete the proof, it will suffice from

$$\int_{s_1}^{\infty} \{f_n^{1/2}(x) - g_n^{1/2}(x)\}^2\,dx = \int_{s_1}^{\infty} f_n(x)\,dx - 2\int_{s_1}^{\infty} f_n^{1/2}(x)g_n^{1/2}(x)\,dx + \int_{s_1}^{\infty} g_n(x)\,dx$$

to show that $1 - F_n(s_1(b_n)) = o(r_n^2)$, $1 - G_n(s_1(b_n)) = o(r_n^2)$, and similarly that $F_n(s_2(b_n))$ and $G_n(s_2(b_n))$ are each $o(r_n^2)$. In the case of $G_n$, these results follow directly from (3.5) and (3.6), also using (3.7) to show that the $r_n$ term in the definition of $G_n$ may be ignored for the purpose of this comparison. In the case of $F_n$, note that (3.15) is an expression for $-\log F^n(a_n x + b_n)$; using (3.5), (3.6) and (3.7) again, the result follows. With this the proof of the theorem is complete.
A similar result is obtained for threshold convergence. We state the following without proof:

Theorem 3.5 Suppose $\phi$, defined from (2.3), satisfies (2.5) with $\gamma > -\tfrac12$ and (2.21). Define $F_u$, $G_u$ by

$$F_u(x) = \frac{F(u + x\phi(u)) - F(u)}{1 - F(u)}, \qquad (3.16)$$

$$G_u(x) = 1 - \{1 + x\phi'(u)\}^{-1/\phi'(u)}\,[1 + c\,g(u)H_\rho(x, \phi'(u))]$$

$(x > 0)$, with associated densities $f_u$, $g_u$. Defining $s_2(u)$ to be 0, suppose $s_1(u)$ exists such that (3.5), (3.7) and (3.8) are satisfied. Then $g(u)^{-1}H(f_u, g_u) \to 0$ as $u \uparrow x^*$.
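The threshold distribution (3.16) can be evaluated directly for a concrete F. The sketch below does this for the standard normal with $1 - F(u) = 0.01$ and compares it with the two leading-order forms of $G_u$: the exponential distribution, and the generalized Pareto form obtained by dropping the correction term $c\,g(u)H_\rho$. It assumes $\phi(u) = \{1 - F(u)\}/f(u)$, which is what (4.1) gives when $1 - F(b_n) = n^{-1}$; the function names, grid bounds and step are illustrative choices, not part of the paper.

```python
import math
from statistics import NormalDist

STD = NormalDist()
U = STD.inv_cdf(0.99)              # threshold with 1 - F(u) = 0.01
TAIL = 1.0 - STD.cdf(U)            # 1 - F(u)
SCALE = TAIL / STD.pdf(U)          # assumed phi(u) = {1 - F(u)} / f(u)
GAMMA_U = SCALE * U - 1.0          # phi'(u) for the normal, since f'/f = -x


def threshold_cdf(x):
    """F_u(x) of (3.16) for the standard normal."""
    return (STD.cdf(U + x * SCALE) - STD.cdf(U)) / TAIL


def gpd_cdf(x, g):
    """1 - (1 + g x)^(-1/g); the case g = 0 is the exponential limit."""
    if g == 0.0:
        return 1.0 - math.exp(-x)
    t = 1.0 + g * x
    if t <= 0.0:                   # beyond the finite endpoint (only when g < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / g)


def ks_distance(g, hi=20.0, step=0.001):
    """Grid approximation to sup_x |F_u(x) - G(x)| over 0 <= x <= hi."""
    n_steps = int(round(hi / step))
    return max(abs(threshold_cdf(i * step) - gpd_cdf(i * step, g))
               for i in range(n_steps + 1))


ks_exponential = ks_distance(0.0)
ks_pareto = ks_distance(GAMMA_U)
print(f"K-S distance, exponential:        {ks_exponential:.4f}")
print(f"K-S distance, generalized Pareto: {ks_pareto:.4f}")
```

The two printed values can be set against the Kolmogorov-Smirnov entries for the first and second approximations in part II of Table 1. The distance is unchanged by rescaling x, so working in units of $\phi(u)$ does not affect the comparison.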

For the k largest order statistics (k fixed, $n \to \infty$) it seems impossible to avoid an additional error term of $O(n^{-1})$ (cf. Falk 1986). This does not matter, of course, if $n r_n \to \infty$, which is usually the case in practice. Also, in this case, it does not matter whether we start with (2.1) or (2.3) as our definition of $\phi$.

Theorem 3.6 Suppose the assumptions of Theorem 3.3 are satisfied, with $\phi$ defined from either (2.1) or (2.3). Let $Y_1 \le \dots \le Y_n$ denote the order statistics of a sample from F, and let $X_i^{(n)} = (Y_{n-i+1} - b_n)/a_n$ for $i = 1, 2, \dots, k$, where k is a fixed positive integer. Let $f_n(x_1, \dots, x_k)$ denote the joint density of $X_1^{(n)}, \dots, X_k^{(n)}$. Define

$$\psi_n(x;\theta) = (1 + x\gamma_n)^{-1/\gamma_n}\{1 + \theta H_\rho(x, \gamma_n)\},$$

$$g_n(x_1, \dots, x_k) = \prod_{i=1}^{k}\{-\psi_n'(x_i; c r_n)\}\,\exp\{-\psi_n(x_k; c r_n)\},$$

defined when $x_1 \ge \dots \ge x_k$, $1 + x_i\gamma_n > 0$ for each i. Then

$$H(f_n, g_n) = o(r_n) + O(n^{-1}).$$


Proof. Assume $\phi$ has been defined from (2.1). We have

$$f_n(x_1, \dots, x_k) = \frac{n!}{(n-k)!}\,\prod_{i=1}^{k}\{a_n f(a_n x_i + b_n)\}\,F^{n-k}(a_n x_k + b_n).$$

Let us first replace this by


$$\tilde f_n(x_1, \dots, x_k) = \prod_{i=1}^{k}\left\{\frac{n a_n f(a_n x_i + b_n)}{F(a_n x_i + b_n)}\right\} F^{n}(a_n x_k + b_n).$$


It is easy to see, by writing the likelihood ratio $f_n/\tilde f_n$ in terms of uniform order statistics, that $H(f_n, \tilde f_n)$ is $O(n^{-1})$. Define $\psi_n$ as above,


fn(x:6)  -  -  ^(x;0)  =  (1  +  X7n)  -  T  -  eHfi(X.Trny 


0H'  (x.tt  ) 
pv  ny 


n‘ 


Now,  using  (3.14)  and  (3.15), 

0xi . v  - 


k 

=  n 

i=l 


H\) 

iogFC^+x^t^)) 

•  ^ 

logF(bn+xk^(bn)) 

+  xi^(bn)) 

log  F(bn) 

•exp- 

log  F(bn)) 

c3<Vxi>rn>Vxi:  c4(VXi>rn)}exp{'VV  W^n^  ’ 


We  also  have 


gn<xl . xk>  *  1^ffn(xi;cr„)Vxl:crn»exp<‘+n(xk:crn»' 

The  proof  is  now  similar  to  that  of  Theorem  3.3,  in  that  we  define  a 
2k-parameter  family 

f„<xr  -  V  8<I)'  e‘2)>  *  )Mf„(x,:9<2))Vxr«l1)>'xP(-+n(xk^SI)) 

with parameters $\theta^{(1)} = (\theta_1^{(1)}, \dots, \theta_k^{(1)})$, $\theta^{(2)} = (\theta_1^{(2)}, \dots, \theta_k^{(2)})$, and apply Lemma 3.1 with $B = (s_2(b_n), s_1(b_n))^k$. Proof that the integral in (3.4) is bounded is similar to the corresponding proof in Theorem 3.3, and the extension from B to $\mathbb{R}^k$ is also similar. With this the proof of Theorem 3.6 is complete.

4.  Examples 

Three examples will be used to illustrate the foregoing theory. These are normal maxima, lognormal minima and minima from a Gamma distribution with index $\alpha > 1$. The last two are treated by reflecting about the origin so as to use the theory for maxima. All three examples have $n r_n \to \infty$, and this allows us to make two small changes in the procedures without affecting the claimed rates of convergence. These are to define $\phi$ from (2.3) instead of (2.1), and to define $b_n$ by $F(b_n) = 1 - n^{-1}$ instead of $\exp(-n^{-1})$. Since we are involving the normal distribution, we use $\Phi$ to denote the standard normal distribution function but keep $\phi$ in the sense in which it has been used throughout the paper. The normal density will be written $\Phi'(x) = (2\pi)^{-1/2}\exp(-x^2/2)$.


With $\phi$ and $b_n$ as just defined, we may write

$$a_n = \phi(b_n) = \{n f(b_n)\}^{-1}, \qquad (4.1)$$

$$\gamma_n = \phi'(b_n) = -1 - \frac{\phi(b_n) f'(b_n)}{f(b_n)}. \qquad (4.2)$$

We define $\varepsilon_n = c\,g(b_n)$, which is taken to be $\phi(b_n)\phi''(b_n)$ when F is in the domain of attraction of $\Lambda$. In this case, further application of (4.2) gives

$$\varepsilon_n = \gamma_n^2 + \gamma_n - a_n^2\left[\frac{f'(x)}{f(x)}\right]'_{x = b_n}. \qquad (4.3)$$

Experience has shown that it is important to use the exact constants; even minor variations on the foregoing scheme upset the comparisons to follow.


Normal distribution. Take $F = \Phi$, $f(x) = \Phi'(x)$ and so

$$\frac{f'(x)}{f(x)} = -x, \qquad \left[\frac{f'(x)}{f(x)}\right]' = -1.$$

We define $b_n$ by $\Phi(b_n) = 1 - n^{-1}$; application of (4.1)-(4.3) yields

$$a_n = n^{-1}(2\pi)^{1/2}\exp(b_n^2/2),$$

$$\gamma_n = a_n b_n - 1,$$

$$\varepsilon_n = \gamma_n^2 + \gamma_n + a_n^2.$$

The expansion $\{1 - \Phi(x)\}/\Phi'(x) = x^{-1} - x^{-3} + 3x^{-5} - \dots$ shows that $\phi(x) \sim x^{-1}$, $\phi'(x) \sim -x^{-2}$, $\phi(x)\phi''(x) \sim 2x^{-4}$. Since $b_n = O\{(\log n)^{1/2}\}$ we have that the rates of convergence of the ultimate and penultimate approximations are $O\{(\log n)^{-1}\}$, $O\{(\log n)^{-2}\}$.
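As a check on the normal-distribution formulas, the constants can be evaluated numerically and set against the header of Table 1 (location, scale, GAMMA and EPSILON for n = 100). This is a minimal sketch using only the Python standard library; the function name `normal_constants` is an illustrative choice.

```python
import math
from statistics import NormalDist


def normal_constants(n):
    """Return (b_n, a_n, gamma_n, epsilon_n) for maxima of n standard normals."""
    b = NormalDist().inv_cdf(1.0 - 1.0 / n)                    # Phi(b_n) = 1 - 1/n
    a = math.sqrt(2.0 * math.pi) * math.exp(b * b / 2.0) / n   # a_n = {n f(b_n)}^(-1)
    gamma = a * b - 1.0                                        # gamma_n = a_n b_n - 1
    eps = gamma ** 2 + gamma + a ** 2                          # epsilon_n from (4.3)
    return b, a, gamma, eps


b_n, a_n, gamma_n, eps_n = normal_constants(100)
print(f"b_n = {b_n:.4f}, a_n = {a_n:.4f}, "
      f"gamma_n = {gamma_n:.4f}, epsilon_n = {eps_n:.4f}")
```

Calling `normal_constants` with n = 10, 1000, ... gives the constants for the other sample sizes considered in Table 4.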


Lognormal distribution. Take $F(x) = \Phi(-\sigma^{-1}\log|x|)$ for $x < 0$, where $\sigma > 0$. In this case

$$f(x) = \sigma^{-1}|x|^{-1}(2\pi)^{-1/2}\exp\left\{-\frac{(\log|x|)^2}{2\sigma^2}\right\},$$

$$\frac{f'(x)}{f(x)} = |x|^{-1}\{1 + \sigma^{-2}\log|x|\},$$

$$\left[\frac{f'(x)}{f(x)}\right]' = |x|^{-2}\{1 - \sigma^{-2} + \sigma^{-2}\log|x|\}.$$

Hence with $B_n$ satisfying $\Phi(B_n) = 1 - n^{-1}$, we have

$$b_n = -\exp(-\sigma B_n),$$

$$a_n = n^{-1}\sigma|b_n|(2\pi)^{1/2}\exp(B_n^2/2),$$

$$\gamma_n = -a_n|b_n|^{-1}\{1 - \sigma^{-1}B_n\} - 1,$$

$$\varepsilon_n = \gamma_n^2 + \gamma_n - a_n^2|b_n|^{-2}\{1 - \sigma^{-2} + \sigma^{-2}\log|b_n|\}.$$

Since $\phi(x) \sim \sigma^2|x|(-\log|x|)^{-1}$, $\phi'(x) \sim \sigma^2(\log|x|)^{-1}$, $\phi(x)\phi''(x) \sim \sigma^4(-\log|x|)^{-3}$, the rates of convergence are $O(B_n^{-1}) = O\{(\log n)^{-1/2}\}$ for the ultimate approximation, $O(B_n^{-3}) = O\{(\log n)^{-3/2}\}$ for the penultimate approximation.
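The lognormal constants can be checked in the same way against the header of Table 2 (σ = 1, n = 250). The sketch below evaluates $b_n$, $a_n$, $\gamma_n$ and $\varepsilon_n$ from (4.1)-(4.3) for the reflected lognormal $F(x) = \Phi(-\sigma^{-1}\log|x|)$, $x < 0$; the function name and argument order are illustrative.

```python
import math
from statistics import NormalDist


def lognormal_constants(n, sigma):
    """Return (b_n, a_n, gamma_n, epsilon_n) for reflected lognormal minima."""
    big_b = NormalDist().inv_cdf(1.0 - 1.0 / n)        # Phi(B_n) = 1 - 1/n
    b = -math.exp(-sigma * big_b)                      # b_n = -exp(-sigma B_n)
    a = (sigma * abs(b) * math.sqrt(2.0 * math.pi)
         * math.exp(big_b ** 2 / 2.0) / n)             # a_n = {n f(b_n)}^(-1)
    ratio = a / abs(b)
    gamma = -ratio * (1.0 - big_b / sigma) - 1.0       # gamma_n from (4.2)
    bracket = 1.0 - sigma ** -2 + sigma ** -2 * math.log(abs(b))
    eps = gamma ** 2 + gamma - ratio ** 2 * bracket    # epsilon_n from (4.3)
    return b, a, gamma, eps


b_n, a_n, gamma_n, eps_n = lognormal_constants(250, 1.0)
print(f"b_n = {b_n:.4f}, a_n = {a_n:.4f}, "
      f"gamma_n = {gamma_n:.4f}, epsilon_n = {eps_n:.4f}")
```

The value of $\gamma_n$ obtained this way is far from its limit of zero, which is consistent with the slow $O\{(\log n)^{-1/2}\}$ rate for the ultimate approximation.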


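Gamma distribution. Take $f(x) = |x|^{\alpha-1}e^{x}/\Gamma(\alpha)$ for $x < 0$, where $\alpha > 1$. In this case

$$\frac{f'(x)}{f(x)} = 1 - (\alpha - 1)|x|^{-1}.$$

So with $b_n$ satisfying $1 - F(b_n) = n^{-1}$ we have

$$a_n = n^{-1}|b_n|^{-\alpha+1}\exp(|b_n|)\,\Gamma(\alpha),$$

$$\gamma_n = a_n\{|b_n|^{-1}(\alpha - 1) - 1\} - 1,$$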

P  =  1.  £n  =  2|bn|a-2(a  +  1)  1  . 

The  values  for  p  and  &n  follow  from  the  expansion  l-F(x)  = 

{a T(a)}  * |x|a{l-a(a+l)  ^ | x |  +  ...}  of  the  form  (2.16)  with  P  =  1.  D  = 

-a(a+l)  Then  we  take  p  =  P,  cg(u)  -  - |u |^D/32(/3+l)a  ^  as  in  Section  2. 
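The location, scale and shape constants for the Gamma example can be reproduced numerically and compared with the header of Table 3 (α = 5, n = 100). This sketch covers only $b_n$, $a_n$ and $\gamma_n$, obtained from (4.1) and (4.2) with $f'(x)/f(x) = 1 - (\alpha-1)|x|^{-1}$. It assumes an integer index α so that the Gamma distribution function has a closed (Erlang) form, and finds $b_n$ by bisection; the function names and the bisection bracket are illustrative choices.

```python
import math


def gamma_lower_cdf(y, alpha):
    """P(Y <= y) for Y ~ Gamma(alpha, 1) with integer alpha (Erlang form)."""
    partial = sum(y ** k / math.factorial(k) for k in range(alpha))
    return 1.0 - math.exp(-y) * partial


def gamma_constants(n, alpha):
    """Return (b_n, a_n, gamma_n) for minima of Gamma(alpha, 1), reflected to x < 0."""
    lo, hi = 0.0, 50.0
    for _ in range(200):                      # bisection for P(Y <= |b_n|) = 1/n
        mid = 0.5 * (lo + hi)
        if gamma_lower_cdf(mid, alpha) < 1.0 / n:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)                       # y = |b_n|
    a = y ** (1 - alpha) * math.exp(y) * math.gamma(alpha) / n   # {n f(b_n)}^(-1)
    gamma = a * ((alpha - 1) / y - 1.0) - 1.0                    # gamma_n from (4.2)
    return -y, a, gamma


b_n, a_n, gamma_n = gamma_constants(100, 5)
print(f"b_n = {b_n:.4f}, a_n = {a_n:.4f}, gamma_n = {gamma_n:.4f}")
```

For non-integer α the same bisection works with any routine for the regularized incomplete gamma function in place of `gamma_lower_cdf`.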

Figure 1 shows the exact density for normal maxima with n = 100, together with our three approximations, i.e. (1.5) with $\gamma = 0$, (1.5) with $\gamma = \gamma_n$, and (2.22). All three approximations are close to the true density, but the first approximation is perceptibly the worst of the three, and the third approximation the best. Table 1 gives more details of the exact and three approximate distributions, including mean, variance, skewness and kurtosis of each, and three measures of discrepancy between the approximations and exact densities: the uniform or Kolmogorov-Smirnov distance (1.6), the Hellinger distance (3.3) and total variation distance which, in the same notation as (3.3), is calculated by

$$V(f_n, g_n) = \int \{f_n(x) - g_n(x)\}_+\,dx. \qquad (4.4)$$

The calculations confirm our overall claim about the ranking of the three approximations. Also shown are the corresponding calculations for the threshold distribution, i.e. (2.12) with $\gamma = 0$ (exponential distribution), (2.12) with $\gamma = \gamma_n$, and (2.25). It is noticeable that the first approximation is very poor when assessed by skewness and kurtosis, but much better when assessed by the other criteria. This is mainly responsible for the adverse comments made by Fisher and Tippett (1928), who took skewness and kurtosis as their main criterion of fit. It also warns of the danger in using moments for statistical comparison.
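The Kolmogorov-Smirnov and total variation distances for the first two approximations to normal maxima can be recomputed by brute-force numerical integration. The sketch below uses n = 100, the exact distribution $\Phi^n$, and the form (1.5) with location $b_n$, scale $a_n$ and shape either 0 or $\gamma_n$; total variation follows (4.4). The Hellinger distance and the third approximation are omitted because (3.3) and (2.22) are not restated here. The grid limits, step size and function names are assumptions of this sketch, not the paper's own numerical scheme.

```python
import math
from statistics import NormalDist

N = 100
STD = NormalDist()
B_N = STD.inv_cdf(1.0 - 1.0 / N)                                  # location constant
A_N = math.sqrt(2.0 * math.pi) * math.exp(B_N ** 2 / 2.0) / N     # scale constant
GAMMA_N = A_N * B_N - 1.0                                         # penultimate shape


def exact_cdf(x):
    return STD.cdf(x) ** N


def exact_pdf(x):
    return N * STD.cdf(x) ** (N - 1) * STD.pdf(x)


def gev_cdf(x, g):
    """Distribution function (1.5) at (x - b_n)/a_n with shape g."""
    z = (x - B_N) / A_N
    if g == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + g * z
    if t <= 0.0:                       # outside the support
        return 1.0 if g < 0.0 else 0.0
    return math.exp(-t ** (-1.0 / g))


def gev_pdf(x, g):
    """Density corresponding to gev_cdf."""
    z = (x - B_N) / A_N
    if g == 0.0:
        return math.exp(-z - math.exp(-z)) / A_N
    t = 1.0 + g * z
    if t <= 0.0:
        return 0.0
    return t ** (-1.0 / g - 1.0) * math.exp(-t ** (-1.0 / g)) / A_N


def distances(g, lo=-1.0, hi=8.0, step=0.001):
    """Return (K-S distance, total variation distance (4.4)) on a uniform grid."""
    ks, tv = 0.0, 0.0
    for i in range(int(round((hi - lo) / step)) + 1):
        x = lo + i * step
        ks = max(ks, abs(exact_cdf(x) - gev_cdf(x, g)))
        tv += max(exact_pdf(x) - gev_pdf(x, g), 0.0) * step   # Riemann sum of {f - g}+
    return ks, tv


ks_first, tv_first = distances(0.0)
ks_second, tv_second = distances(GAMMA_N)
print(f"1st approx: K-S = {ks_first:.4f}, total variation = {tv_first:.4f}")
print(f"2nd approx: K-S = {ks_second:.4f}, total variation = {tv_second:.4f}")
```

The output can be compared with the corresponding rows of part I of Table 1. Because (4.4) equals the largest discrepancy in probability over all sets, it can never be smaller than the Kolmogorov-Smirnov distance, which gives a quick sanity check on any such calculation.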

Figure 2 and Table 2 show corresponding calculations for the lognormal distribution with $\sigma = 1$, n = 250. We took a larger sample size here because of the poorer overall fit. The most striking thing here is that the first approximation is very much worse than the other two. Note also $\gamma_n = -0.4422$, a long way from its limiting value $\gamma = 0$.

Figure 3 and Table 3 are for the Gamma distribution with $\alpha = 5$, n = 100. For the first approximation in this case we took

$$F^n(x) \approx \exp\{-(x/b_n)^{\alpha}\}, \qquad x < 0,$$

equivalent to the classical two-parameter Weibull approximation usually assumed in this situation. Figure 3 shows strikingly how poor it is. The other two approximations are indistinguishable from the true density, except in one tail.

Finally, in Table 4 we give calculations for the normal distribution at sample sizes $n = 10^m$, $m = 1, \dots, 5$. The decrease in distance from approximate to exact agrees very well with the theoretical rates of decay, of $O\{(\log n)^{-1}\}$, $O\{(\log n)^{-2}\}$ and $O\{(\log n)^{-3}\}$, for the three approximations.
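One way to see the agreement with the theoretical rates is to regress the logarithm of a tabulated distance on $\log\log n$: a rate of $O\{(\log n)^{-j}\}$ corresponds to a slope near $-j$. The sketch below does this for the Kolmogorov-Smirnov rows of Table 4, using only the three largest sample sizes so that small-sample effects are excluded. That cut-off, and the least-squares fit itself, are choices made for this illustration rather than anything taken from the paper.

```python
import math

# Kolmogorov-Smirnov distances transcribed from Table 4 (n = 10, 10^2, ..., 10^5).
sample_sizes = [10 ** m for m in range(1, 6)]
ks_distance = {
    "1st": [0.052, 0.0272, 0.01825, 0.01368, 0.01092],
    "2nd": [0.026, 0.0052, 0.00236, 0.00133, 0.00085],
    "3rd": [0.028, 0.0027, 0.00039, 0.00017, 0.00009],
}


def loglog_slope(ns, ds):
    """Least-squares slope of log(distance) against log(log n)."""
    xs = [math.log(math.log(n)) for n in ns]
    ys = [math.log(d) for d in ds]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Fit on n = 10^3, 10^4, 10^5 only.
slopes = {name: loglog_slope(sample_sizes[2:], ds[2:])
          for name, ds in ks_distance.items()}
for name, slope in slopes.items():
    print(f"{name} approximation: fitted exponent of log n = {slope:.2f}")
```

The same function can be applied to the total variation and Hellinger rows of Table 4.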
TABLE 1

STANDARD NORMAL DISTRIBUTION, n = 100.

LOCATION CONST = 2.3263; SCALE CONST = .3752; GAMMA = -.1271
RHO = 1.0; EPSILON = .0298

I. DISTRIBUTION OF SAMPLE MAXIMA

                          EXACT    1ST APPR   2ND APPR   3RD APPR
MEAN:                     2.518    2.553      2.511      2.517
VARIANCE:                 .184     .232       .175       .187
SKEWNESS:                 .429     1.298      .276       .422
KURTOSIS:                 3.765    5.399      3.309      3.709
KOLM-SMIR DIST:                    .0272      .0052      .0027
TOTAL VARIATION DIST:              .0390      .0065      .0041
HELLINGER DIST:                    .083       .026       .011

II. THRESHOLD DISTRIBUTION

                          EXACT    1ST APPR   2ND APPR   3RD APPR
MEAN:                     .344     .380       .338       .345
VARIANCE:                 .097     .141       .088       .098
SKEWNESS:                 2.529    4.001      2.004      2.512
KURTOSIS:                 6.302    9.001      5.246      6.129
KOLM-SMIR DIST:                    .0311      .0055      .0013
TOTAL VARIATION DIST:              .0311      .0055      .0014
HELLINGER DIST:                    .073       .026       .009


TABLE 2

LOGNORMAL DISTRIBUTION, SIGMA = 1, n = 250.

LOCATION CONST = -.0705; SCALE CONST = .0238; GAMMA = -.442
RHO = 1.0; EPSILON = .0557

I. DISTRIBUTION OF SAMPLE MAXIMA

                          EXACT    1ST APPR   2ND APPR   3RD APPR
MEAN:                     -.0635   -.0563     -.0639     -.0636
VARIANCE:                 .00053   .00093     .00050     .00053
SKEWNESS:                 .174     1.298      .226       .208
KURTOSIS:                 3.088    5.400      2.993      3.121
KOLM-SMIR DIST:                    .1117      .0137      .0053
TOTAL VARIATION DIST:              .1450      .0151      .0062
HELLINGER DIST:                    .325       .085       .070

II. THRESHOLD DISTRIBUTION

                          EXACT    1ST APPR   2ND APPR   3RD APPR
MEAN:                     .0179    .0248      .0175      .0179
VARIANCE:                 .00016   .00057     .00014     .00016
SKEWNESS:                 .63      4.01       .44        .55
KURTOSIS:                 2.96     9.01       2.59       2.76
KOLM-SMIR DIST:                    .1262      .0143      .0051
TOTAL VARIATION DIST:              .1262      .0143      .0051
HELLINGER DIST:                    .303       .084       .072

TABLE 3

GAMMA DISTRIBUTION, ALPHA = 5, n = 100.

LOCATION CONST = -1.2791; SCALE CONST = .3222; GAMMA = -.3147
RHO = 1.0; EPSILON = .0381

I. DISTRIBUTION OF SAMPLE MAXIMA

                          EXACT    1ST APPR   2ND APPR   3RD APPR
MEAN:                     -1.165   -1.173     -1.170     -1.167
VARIANCE:                 .104     .072       .100       .105
SKEWNESS:                 .0009    .0646      .0127      .0039
KURTOSIS:                 2.834    2.880      2.715      2.808
KOLM-SMIR DIST:                    .0539      .0078      .0031
TOTAL VARIATION DIST:              .1027      .0117      .0066
HELLINGER DIST:                    .155       .060       .039

II. THRESHOLD DISTRIBUTION

                          EXACT    1ST APPR   2ND APPR   3RD APPR
MEAN:                     .2505    .2142      .2461      .2504
VARIANCE:                 .040     .032       .037       .040
SKEWNESS:                 1.055    1.400      .810       .980
KURTOSIS:                 3.687    4.200      3.210      3.484
KOLM-SMIR DIST:                    .0758      .0081      .0032
TOTAL VARIATION DIST:              .0785      .0109      .0040
HELLINGER DIST:                    .100       .059       .037



TABLE 4

STANDARD NORMAL DISTRIBUTION

DISTRIBUTION OF MAXIMA

FIT OF THREE APPROXIMATIONS FOR VARIOUS SAMPLE SIZES

                          1ST APPR   2ND APPR   3RD APPR

SAMPLE SIZE 10
KOLM-SMIR DIST:           .052       .026       .028
TOTAL VARIATION DIST:     .0789      .0350      .0351
HELLINGER DIST:           .1404      .1024      .0876

SAMPLE SIZE 100
KOLM-SMIR DIST:           .0272      .0052      .0027
TOTAL VARIATION DIST:     .0390      .0065      .0040
HELLINGER DIST:           .083       .026       .011

SAMPLE SIZE 1,000
KOLM-SMIR DIST:           .01825     .00236     .00039
TOTAL VARIATION DIST:     .02595     .00273     .00067
HELLINGER DIST:           .0576      .0122      .0026

SAMPLE SIZE 10,000
KOLM-SMIR DIST:           .01368     .00133     .00017
TOTAL VARIATION DIST:     .01930     .00157     .00020
HELLINGER DIST:           .0433      .0070      .0011

SAMPLE SIZE 100,000
KOLM-SMIR DIST:           .01092     .00085     .00009
TOTAL VARIATION DIST:     .01531     .00101     .00010
HELLINGER DIST:           .0345      .0045      .0006


FIGURE 1: DENSITY OF SAMPLE MAXIMA FOR STANDARD NORMAL DISTRIBUTION, n = 100.

Top to bottom at x = 2.6: 2nd approx., 3rd approx., Exact, 1st approx.

Top to bottom at x = 3.3: 1st approx., Exact, 3rd approx., 2nd approx.


FIGURE 2: DENSITY OF SAMPLE MINIMA FOR LOGNORMAL DISTRIBUTION, SIGMA = 1, n = 250.

Top to bottom at x = 0.5: Exact, 2nd approx., 3rd approx., 1st approx.


FIGURE 3: DENSITY OF SAMPLE MINIMA FOR GAMMA DISTRIBUTION, ALPHA = 5, n = 100.

The 1st approximation is the curve visibly removed from the others; the exact density and the 2nd and 3rd approximations are virtually indistinguishable.


